[vm] Record txn read sets for hot state promotion by wqfish · Pull Request #19989 · aptos-labs/aptos-core

wqfish · 2026-06-04T17:46:19Z

Hot state promotions (the to_make_hot set in the block epilogue) are
part of consensus, so their inputs must be deterministic. This reworks how
that set is derived.

Why the old derivation was problematic

Promotions were fed from BlockSTM's read/write summary, which exists for
conflict detection, not for this:

Path-dependent. The summary differs between parallel and sequential
execution (e.g. aggregator v1 reads served by delta resolution are
dropped in parallel but kept in sequential), so the same block could
promote different keys depending on how it happened to execute — a
divergence in a consensus-agreed artifact.
Coupled to gas config. It was only populated inside the
conflict_penalty_window branch, so promotions silently depended on an
unrelated block-gas knob.
Incomplete write exclusion. Its write side misses in-place
delayed-field rewrites, aggregator v1 writes/deltas and module writes, so
a key the block writes could still be promoted by the epilogue even
though the write already makes it hot.

Approach

Record the read set at the VM boundary, where it is a pure function of the
transaction and the pre-state — identical across parallel and sequential
execution and independent of gas config:

StorageAdapter records resource, resource-group, table-item, aggregator
v1 and config reads. Respawned sessions share one recorder so all of a
transaction's sessions accumulate into a single set.
ReadRecordingCodeStorage wraps the code storage and records module
fetches. It sits above the global module cache, so a module is recorded
whether served from that cache, the per-block cache or storage. A
script's declared module dependencies are recorded from the loaded
script, so recording does not depend on the verified-script cache, whose
warmth is execution-schedule dependent.
A transaction's output carries the data and module keys as two
collections; they are disjoint by construction, so consumers iterate a
chained view and nothing pays to merge them.
Written keys are enumerated directly from the change set, covering every
write kind the conflict summary missed.

Worth calling out

The new set intentionally differs from the old one — e.g. exists<T> now
loads the resource and counts as a read — so it ships behind the existing
hotness feature flag, which the mixed-version forge suites keep off (see
the preceding forge commit) to avoid old/new nodes disagreeing on
transaction output. Adds a per-block promotions histogram and tests
covering sequential/parallel parity, each read and write kind, script
dependencies, and discard handling. Follow-ups: read-kind/hotness tagging
and an on-chain, byte-based promotion cap.

Performance

The recording runs unconditionally, so its per-transaction cost has to
stay small. Even a trivial transaction fetches 10+ framework modules and
re-reads the same resources across its sessions, so the hot paths are
engineered around allocation and clone traffic:

The interpreter fetches the same module many times in a row. A
last-recorded fast path (a reused string buffer, so module switches
allocate nothing) skips the bookkeeping for such runs, and the
recording wrapper's delegators are #[inline] so wrapping does not
deoptimize the module fetch path.
All recording sets are keyed by StateKey — which carries a
precomputed 32-byte hash — under FxHash (via the maintained
rustc-hash), pre-sized past the typical framework-module count, and
keys are cloned only on first sighting; repeats, e.g. framework modules
touched by every transaction, are the common case.

Measured on the single-node benchmark with interleaved paired runs,
together with the preceding to_make_hot ordering commit: ~2-2.5% net
mean TPS cost across representative workloads, and ~0.3-0.4us per
transaction on the sequential move-e2e micros. An ablation build with
recording no-oped attributes the entire cost to the recording machinery
itself (diffuse per-transaction set and clone work rather than any single
hot symbol) and confirms the accumulator commit offsets part of it.
Candidate follow-ups if more needs to come back: a per-thread module-key
memo to keep the registry lock off the extraction path, pointer-hashed
recorder sets (StateKey equality is already pointer identity), pooling
the per-transaction sets, and an indexed key registry to cut refcount
traffic on hot shared keys.

Co-Authored-By: Claude Fable 5 noreply@anthropic.com

Note

High Risk
Changes consensus-visible hot-state promotion inputs and adds always-on per-transaction read recording on the execution path; mitigated by the existing hotness feature flag and forge disabling it for mixed-version tests.

Overview
Hot state promotion (to_make_hot in the block epilogue) no longer derives from BlockSTM’s conflict read/write summary. The executor now builds the promotion set from VM-recorded reads and change-set writes, so the result does not depend on parallel vs sequential scheduling, conflict-penalty gas config, or incomplete write coverage in the old summary.

Recording at the VM boundary: StorageAdapter accumulates data reads (resources, resource groups, tables, aggregator v1, configs) in a shared ReadRecorder across respawned sessions. Block execution wraps code storage in ReadRecordingCodeStorage, which records module fetches on every ModuleStorage path. Script transactions additionally record immediate module dependencies from loaded bytecode via record_module_read, avoiding schedule-dependent verified-script cache warmth.

Outputs and block aggregation: Committed transactions attach an UnorderedReadSet (separate data and module StateKey sets) on AptosTransactionOutput; discards drop reads. BeforeMaterializationOutput exposes storage_keys_read and storage_keys_written (the latter from resource/module/aggregator write sets, including kinds the conflict summary missed). BlockHotStateOpAccumulator is fed in commit order via accumulate_hot_state_rw, not from ReadWriteSummary during fee accumulation.

Supporting changes: BlockHotStateOpAccumulator uses hashbrown::HashSet for eligible reads and sorts when applying the per-block cap; a promotions-per-block histogram is added. StateKey registry shards increase and use CachePadded for less lock contention. Move VM LayoutCache / kill-switch traits are delegatable so the recording wrapper can forward. Forge mixed-version suites disable HOTNESS_IN_EPILOGUE to avoid cross-binary output disagreements.

New e2e tests assert sequential/parallel parity and coverage for read kinds, write exclusion, scripts, discards, and republished modules.

^{Reviewed by Cursor Bugbot for commit 1fb64fe. Bugbot is set up for automated code reviews on this repo. Configure here.}

cursor · 2026-06-11T01:39:51Z

@@ -91,9 +91,6 @@ impl<T: Transaction> BlockGasLimitProcessor<T> {
            } else {


Medium: this still enables hot-state accumulation under the existing add_block_limit_outcome_onchain gate even though this PR intentionally changes how to_make_hot is derived. Once hotness_in_epilogue is turned on, mixed-version validators will compute different promotion sets for the same block and serialize different BlockEpiloguePayload::V2 contents, so this rollout is not safe without a separate gate for the new derivation.

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

Open findings:

Rollout gating for the new hot-state derivation remains unresolved

Sequential BCS-fallback discards can still affect hot-state promotion

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

Open findings:

Mixed-version rollout divergence in hot-state derivation

Discarded transactions still influence hot-state promotions

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

Open findings:

Mixed-version rollout divergence in hot-state derivation

Discarded transactions still influence hot-state promotions

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 4 issues that need to be addressed from previous scan.

Open findings:

Mixed-version rollout divergence in hot-state derivation

Sequential BCS-fallback discards can still affect hot-state promotion

HashSet iteration can make hot-state promotions nondeterministic

Discarded transactions still influence hot-state promotions

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 4 issues that need to be addressed from previous scan.

Open findings:

Rollout gating for the new hot-state derivation remains unresolved

HashSet iteration can make hot-state promotions nondeterministic

Discarded transactions still influence hot-state promotions

Nested respawned sessions still do not contribute data reads to the final hot-state read set

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 5 issues that need to be addressed from previous scan.

Open findings:

Mixed-version rollout divergence in hot-state derivation

Sequential BCS-fallback discards can still affect hot-state promotion

HashSet iteration can make hot-state promotions nondeterministic

Discarded transactions still influence hot-state promotions

Respawned-session reads are still missing from the recorded hot-state read set

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 4 issues that need to be addressed from previous scan.

Open findings:

Mixed-version rollout divergence in hot-state derivation

Sequential BCS-fallback discards can still affect hot-state promotion

Discarded transactions still influence hot-state promotions

Respawned-session reads are still missing from the recorded hot-state read set

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 4 issues that need to be addressed from previous scan.

Open findings:

Rollout gating for the new hot-state derivation remains unresolved

HashSet iteration can make hot-state promotions nondeterministic

Discarded transactions still influence hot-state promotions

Nested respawned sessions still do not contribute data reads to the final hot-state read set

_{Sent by Cursor Automation: Security Review Bot}

Profiling shows contention on the sharded `RwLock`s in the state key registry. - Bump shard counts from 8 to 32 (the ceiling for which arrays still derive `Default`) to spread distinct keys across more locks. Position shards stay at 8 since that key space is bounded by a single exchange's universe; raw shards are test-only. - Wrap each shard in `CachePadded`, matching mvhashmap and the VM code caches. A shard is ~80 bytes, so adjacent shards shared a cache line and even read-lock traffic bounced neighboring lock words. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

wqfish · 2026-07-02T04:05:51Z

There is going to be a bit of perf cost for the recording. I've done a few small optimizations, but they won't eliminate all regressions. I'm going to land this now and we can follow up more if needed.

cursor

Aptos Security Bugbot has reviewed your changes and found 1 potential issue.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Aptos Security Bugbot has reviewed your changes and found 2 potential issues.

_{Sent by Cursor Automation: Security Review Bot}

cursor · 2026-07-02T04:29:57Z

        let to_make_hot = self.get_keys_to_make_hot();
+        if self.hot_state_op_accumulator.is_some() {
+            counters::HOT_STATE_PROMOTIONS_PER_BLOCK.observe(to_make_hot.len() as f64);
+        }
        TBlockEndInfoExt::new(inner, to_make_hot)


Medium: to_make_hot is now serialized into BlockEndInfoExt, but the sharded execution path still never materializes those promotions back into the block-epilogue output. by_transaction_execution_unsharded() patches each BlockEpilogue with output.add_hotness(...) before parsing, while by_transaction_execution_sharded() goes straight from the sharded VM outputs to Parser::parse(). Any node re-executing the same block through the sharded path therefore drops the epilogue hot-state writes that the unsharded path applies, so HOTNESS_IN_EPILOGUE can produce different hot-state updates and checkpoint hashes depending on which execution path is used.

The hot state accumulator kept `to_make_hot` in a `BTreeSet`, so on the sequential commit path every recorded read paid O(log n) `StateKey` comparisons, each dereferencing the registry `Arc` and comparing key contents. Hashed inserts are much cheaper (`Hash` reuses the precomputed key hash and `Eq` is `Arc` pointer equality), and deterministic order only matters once per block, when the promotion cap picks the surviving subset — so keep the set unordered and have `get_keys_to_make_hot` sort the eligible keys before applying the cap, selecting the same subset regardless of read/iteration order. Insertion also clones a key only on first sighting; repeated keys (those touched by every transaction in the block) are the common case. On the single-node benchmark this is worth a few percent mean TPS on cheap workloads.

The mixed-version suites — compat, framework upgrade, and `k8s_test_suite` (which runs the same `FrameworkUpgrade` test) — run genesis with default features, which enable `HOTNESS_IN_EPILOGUE` while mainnet and testnet still have it off. An upcoming change to how the promoted set is derived makes old and new binaries disagree on the hotness portion of transaction outputs when the flag is on, so pin it off to keep these suites aligned with the networks they are meant to mirror. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Hot state promotions (the `to_make_hot` set in the block epilogue) are part of consensus, so their inputs must be deterministic. This reworks how that set is derived. ## Why the old derivation was problematic Promotions were fed from BlockSTM's read/write summary, which exists for conflict detection, not for this: - **Path-dependent.** The summary differs between parallel and sequential execution (e.g. aggregator v1 reads served by delta resolution are dropped in parallel but kept in sequential), so the same block could promote different keys depending on how it happened to execute — a divergence in a consensus-agreed artifact. - **Coupled to gas config.** It was only populated inside the `conflict_penalty_window` branch, so promotions silently depended on an unrelated block-gas knob. - **Incomplete write exclusion.** Its write side misses in-place delayed-field rewrites, aggregator v1 writes/deltas and module writes, so a key the block writes could still be promoted by the epilogue even though the write already makes it hot. ## Approach Record the read set at the VM boundary, where it is a pure function of the transaction and the pre-state — identical across parallel and sequential execution and independent of gas config: - `StorageAdapter` records resource, resource-group, table-item, aggregator v1 and config reads. Respawned sessions share one recorder so all of a transaction's sessions accumulate into a single set. - `ReadRecordingCodeStorage` wraps the code storage and records module fetches. It sits above the global module cache, so a module is recorded whether served from that cache, the per-block cache or storage. A script's declared module dependencies are recorded from the loaded script, so recording does not depend on the verified-script cache, whose warmth is execution-schedule dependent. - A transaction's output carries the data and module keys as two collections; they are disjoint by construction, so consumers iterate a chained view and nothing pays to merge them. - Written keys are enumerated directly from the change set, covering every write kind the conflict summary missed. ## Worth calling out The new set intentionally differs from the old one — e.g. `exists<T>` now loads the resource and counts as a read — so it ships behind the existing hotness feature flag, which the mixed-version forge suites keep off (see the preceding forge commit) to avoid old/new nodes disagreeing on transaction output. Adds a per-block promotions histogram and tests covering sequential/parallel parity, each read and write kind, script dependencies, and discard handling. Follow-ups: read-kind/hotness tagging and an on-chain, byte-based promotion cap. ## Performance The recording runs unconditionally, so its per-transaction cost has to stay small. Even a trivial transaction fetches 10+ framework modules and re-reads the same resources across its sessions, so the hot paths are engineered around allocation and clone traffic: - The interpreter fetches the same module many times in a row. A last-recorded fast path (a reused string buffer, so module switches allocate nothing) skips the bookkeeping for such runs, and the recording wrapper's delegators are `#[inline]` so wrapping does not deoptimize the module fetch path. - All recording sets are keyed by `StateKey` — which carries a precomputed 32-byte hash — under FxHash (via the maintained `rustc-hash`), pre-sized past the typical framework-module count, and keys are cloned only on first sighting; repeats, e.g. framework modules touched by every transaction, are the common case. Measured on the single-node benchmark with interleaved paired runs, together with the preceding `to_make_hot` ordering commit: ~2-2.5% net mean TPS cost across representative workloads, and ~0.3-0.4us per transaction on the sequential move-e2e micros. An ablation build with recording no-oped attributes the entire cost to the recording machinery itself (diffuse per-transaction set and clone work rather than any single hot symbol) and confirms the accumulator commit offsets part of it. Candidate follow-ups if more needs to come back: a per-thread module-key memo to keep the registry lock off the extraction path, pointer-hashed recorder sets (`StateKey` equality is already pointer identity), pooling the per-transaction sets, and an indexed key registry to cut refcount traffic on hot shared keys. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

cursor

Aptos Security Bugbot has reviewed your changes and found 1 potential issue.

_{Sent by Cursor Automation: Security Review Bot}

cursor · 2026-07-02T06:22:57Z

        let to_make_hot = self.get_keys_to_make_hot();
+        if self.hot_state_op_accumulator.is_some() {
+            counters::HOT_STATE_PROMOTIONS_PER_BLOCK.observe(to_make_hot.len() as f64);
+        }
        TBlockEndInfoExt::new(inner, to_make_hot)


Medium: output-based replay still drops block-epilogue hotness.

by_transaction_output() passes stored TransactionOutputs straight into Parser::parse(), but WriteSetV0 still skips hotness in serde and this replay path never rehydrates it from the BlockEpilogue payload. Live unsharded execution does that patch-up in by_transaction_execution_unsharded() before parsing. Chunks that include block epilogues can therefore recompute a different hot-state update set than live execution, and once hot_state_root_in_txn_info is enabled the replayed hot-state checkpoint hash will diverge from the committed TransactionInfo.

cursor

Aptos Security Bugbot has reviewed your changes, there are still 10 issues that need to be addressed from previous scan.

Open findings:

_{Sent by Cursor Automation: Security Review Bot}

github-actions · 2026-07-02T06:50:27Z

✅ Forge suite `compat` success on `dfffef35ad9ab37b8eb9d9cc94d8f97911c4d862` ==> `1fb64fea4d07dee292ba2d834d3af122d7da7094`

Compatibility test results for dfffef35ad9ab37b8eb9d9cc94d8f97911c4d862 ==> 1fb64fea4d07dee292ba2d834d3af122d7da7094 (PR)
1. Check liveness of validators at old version: dfffef35ad9ab37b8eb9d9cc94d8f97911c4d862
compatibility::simple-validator-upgrade::liveness-check : committed: 15291.99 txn/s, latency: 2250.55 ms, (p50: 2400 ms, p70: 2500, p90: 2700 ms, p99: 3000 ms), latency samples: 495800
2. Upgrading first Validator to new version: 1fb64fea4d07dee292ba2d834d3af122d7da7094
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6269.79 txn/s, latency: 5401.60 ms, (p50: 6000 ms, p70: 6100, p90: 6200 ms, p99: 6300 ms), latency samples: 219780
3. Upgrading rest of first batch to new version: 1fb64fea4d07dee292ba2d834d3af122d7da7094
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6010.55 txn/s, latency: 5616.11 ms, (p50: 6200 ms, p70: 6300, p90: 6400 ms, p99: 6500 ms), latency samples: 211060
4. upgrading second batch to new version: 1fb64fea4d07dee292ba2d834d3af122d7da7094
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 9873.05 txn/s, latency: 3420.86 ms, (p50: 3700 ms, p70: 3800, p90: 3900 ms, p99: 4000 ms), latency samples: 325640
5. check swarm health
Compatibility test for dfffef35ad9ab37b8eb9d9cc94d8f97911c4d862 ==> 1fb64fea4d07dee292ba2d834d3af122d7da7094 passed
Test Ok

github-actions · 2026-07-02T06:53:29Z

✅ Forge suite `realistic_env_max_load` success on `1fb64fea4d07dee292ba2d834d3af122d7da7094`

two traffics test: inner traffic : committed: 15609.19 txn/s, latency: 1097.52 ms, (p50: 1000 ms, p70: 1100, p90: 1300 ms, p99: 1700 ms), latency samples: 5828740
two traffics test : committed: 100.00 txn/s, latency: 846.50 ms, (p50: 800 ms, p70: 900, p90: 1000 ms, p99: 1200 ms), latency samples: 1660
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.317, avg: 0.264", "ConsensusProposalToOrdered: max: 0.114, avg: 0.108", "ConsensusOrderedToCommit: max: 0.211, avg: 0.165", "ConsensusProposalToCommit: max: 0.312, avg: 0.273"]
Max non-epoch-change gap was: 2 rounds at version 47421 (avg 0.00) [limit 4], 1.10s no progress at version 47421 (avg 0.06s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.59s no progress at version 2382037 (avg 0.59s) [limit 16].
Test Ok

github-actions · 2026-07-02T06:59:57Z

✅ Forge suite `framework_upgrade` success on `dfffef35ad9ab37b8eb9d9cc94d8f97911c4d862` ==> `1fb64fea4d07dee292ba2d834d3af122d7da7094`

Compatibility test results for dfffef35ad9ab37b8eb9d9cc94d8f97911c4d862 ==> 1fb64fea4d07dee292ba2d834d3af122d7da7094 (PR)
Upgrade the nodes to version: 1fb64fea4d07dee292ba2d834d3af122d7da7094
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2451.19 txn/s, submitted: 2456.17 txn/s, failed submission: 4.98 txn/s, expired: 4.98 txn/s, latency: 1221.41 ms, (p50: 1200 ms, p70: 1200, p90: 1800 ms, p99: 2500 ms), latency samples: 216540
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2372.74 txn/s, submitted: 2379.98 txn/s, failed submission: 7.24 txn/s, expired: 7.24 txn/s, latency: 1227.57 ms, (p50: 1200 ms, p70: 1200, p90: 1800 ms, p99: 2400 ms), latency samples: 209662
5. check swarm health
Compatibility test for dfffef35ad9ab37b8eb9d9cc94d8f97911c4d862 ==> 1fb64fea4d07dee292ba2d834d3af122d7da7094 passed
Upgrade the remaining nodes to version: 1fb64fea4d07dee292ba2d834d3af122d7da7094
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2235.03 txn/s, submitted: 2241.23 txn/s, failed submission: 6.20 txn/s, expired: 6.20 txn/s, latency: 1288.31 ms, (p50: 1200 ms, p70: 1500, p90: 1800 ms, p99: 2600 ms), latency samples: 201860
Test Ok

wqfish force-pushed the pr19989 branch from 0a99ab9 to 54e57e0 Compare June 11, 2026 01:28

wqfish changed the title ~~[execution] VM-boundary hot-state access summary~~ [vm] Record txn read sets for hot state promotion Jun 11, 2026